Delin Qu

Q-GeoMem: Question-Guided Geometric Memory for Video Spatial Reasoning

May 26, 2026

VGGT-Edit: Feed-forward Native 3D Scene Editing with Residual Field Prediction

May 14, 2026

Openpi Comet: Competition Solution For 2025 BEHAVIOR Challenge

Dec 12, 2025

Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation

Dec 11, 2025

F1: A Vision-Language-Action Model Bridging Understanding and Generation to Actions

Sep 09, 2025

EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control

Aug 28, 2025

Hume: Introducing System-2 Thinking in Visual-Language-Action Model

May 29, 2025

Revisiting Multi-Agent World Modeling from a Diffusion-Inspired Perspective

May 27, 2025

Think Small, Act Big: Primitive Prompt Learning for Lifelong Robot Manipulation

Apr 01, 2025

Uni$\textbf{F}^2$ace: Fine-grained Face Understanding and Generation with Unified Multimodal Models

Mar 11, 2025